Building a Large-Scale Knowledge Base for Machine Translation
Authors
Abstract
Knowledge-based machine translation (KBMT) systems have achieved excellent results in constrained domains, but have not yet scaled up to newspaper text. The reason is that knowledge resources (lexicons, grammar rules, world models) must be painstakingly handcrafted from scratch. One of the hypotheses being tested in the PANGLOSS machine translation project is whether these resources can be semi-automatically acquired on a very large scale. This paper focuses on the construction of a large ontology (or knowledge base, or world model) for supporting KBMT. It contains representations for some 70,000 commonly encountered objects, processes, qualities, and relations. The ontology was constructed by merging various online dictionaries, semantic networks, and bilingual resources through semi-automatic methods. Some of these methods (e.g., conceptual matching of semantic taxonomies) are broadly applicable to problems of importing/exporting knowledge from one KB to another. Other methods (e.g., bilingual matching) allow a knowledge engineer to build up an index to a KB in a second language, such as Spanish or Japanese.
USC/Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292. {knight,luk}@isi.edu
Related articles
Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language
This paper describes our effort to build a large-scale commonsense knowledge base in Korean by converting a pre-existing one in English, called ConceptNet. The English commonsense knowledge base is essentially a huge net consisting of concepts and relations. Triplets in the form of Concept-Relation-Concept in the net were extracted from English sentences collected from volunteers through a Web s...
Building A Large Ontology For Machine Translation
This paper describes efforts underway to construct a large-scale ontology to support semantic processing in the PANGLOSS knowledge-based machine translation system. Because we are aiming at broad semantic coverage, we are focusing on automatic and semi-automatic methods of knowledge acquisition. Here we report on algorithms for merging complementary online resources, in particular the LDOCE and ...
Rule base combined linguistics knowledge with corpus
This paper proposes a new approach to constructing rule bases for transfer-based machine translation. In our approach, the rule bases are constructed by combining linguistic knowledge with large-scale corpora. On the one hand, lexical, syntactic, and semantic knowledge are all used in the rules. On the other hand, the knowledge is used for the s...
Automatic extraction of facts, relations, and entities for web-scale knowledge base population
Equipping machines with knowledge, through the construction of machine-readable knowledge bases, presents a key asset for semantic search, machine translation, question answering, and other formidable challenges in artificial intelligence. However, human knowledge predominantly resides in books and other natural language text forms. This means that knowledge bases must be extracted and synthesiz...
Multiple Strategies for Automatic Disambiguation in Technical Translation
The use of knowledge-based machine translation with controlled technical text can produce high-quality translations. However, building and maintaining knowledge bases can require significant time and effort, since they typically involve handcoding of semantic preferences. When a system can't disambiguate based on semantic preferences, it can initiate interactive disambiguation with the author t...